Abstract. XML access control policies involving updates may contain security 
flaws, here called inconsistencies, in which a forbidden operation may be sim- 
ulated by performing a sequence of allowed operations. This paper investigates 
the problem of deciding whether a policy is consistent, and if not, how its incon- 
sistencies can be repaired. We consider policies expressed in terms of annotated 
DTDs defining which operations are allowed or denied for the XML trees that 
are instances of the DTD. We show that consistency is decidable in PTIME for 
such policies and that consistent partial policies can be extended to unique "least- 
privilege" consistent total policies. We also consider repair problems based on 
deleting privileges to restore consistency, show that finding minimal repairs is 
NP-complete, and give heuristics for finding repairs. 

1 Introduction 

Discretionary access control policies for database systems can be specified in a number 
of different ways, for example by storing access control lists as annotations on the data 
itself (as in most file systems), or using rules which can be applied to decide whether to 
grant access to protected resources. In relational databases, high-level policies that em- 
ploy rules, roles, and other abstractions tend to be much easier to understand and main- 
tain than access control list-based policies; also, they can be implemented efficiently 
using static techniques, and can be analyzed off-line for security vulnerabilities [6]. 

Rule-based, fine-grained access control techniques for XML data have been consid- 
ered extensively for read-only queries [10, 14, 13, 12, 2, 16, 9]. However, the problem of 
controlling write access is relatively new and has not received much attention. The authors
of [2, 9, 15] studied the enforcement of write-access control policies following annotation-based
approaches.

In this paper, we build upon the schema-based access control model introduced 
by Stoica and Farkas [18], refined by Fan, Chan, and Garofalakis [10], and extended 
to write-access control by Fundulaki and Maneth [12]. We investigate the problem of 
checking for, and repairing, a particular class of vulnerabilities in XML write-access 
control policies. An access control policy specifies which actions to allow a user to 
perform based on the syntax of the atomic update, not its actual behavior. Thus, it is 
possible that a single-step action which is explicitly forbidden by the policy can nev- 
ertheless be simulated by one or more allowed actions. This is what we mean by an 
inconsistency; a consistent policy is one in which such inconsistencies are not possible. 
We believe inconsistencies are an interesting class of policy-level security vulnerabilities,
since such policies allow users to circumvent the intended effect of the policy. The
purpose of this paper is to define consistency, understand how to determine whether a
policy is consistent, and show how to automatically identify possible repairs for
inconsistent policies.

Fig. 1. DTD graph (a) and XML documents conforming to the DTD (b, c)

Motivating Example: We introduce here an example and refer to it throughout the paper.
Consider the XML DTD represented as a graph in Fig. 1(a). A document conforming
to this DTD has as root an R-element with a single child element that can be either
an A, B, J or K-element (indicated with dashed edges); similarly for G. An A-element
has one C and one D child element. A B-element can have zero or more E child
elements (indicated with *-labeled edges); similarly, E and J elements can have zero
or more G child elements. Finally, F, H, I and K are text elements. Fig. 1(b) and
(c) show two documents that conform to the DTD.

Suppose that a security policy allows one to insert and delete G elements but forbids
one from replacing an H with an I element. It is straightforward to see that the
forbidden operation can be simulated by first deleting the G element with an H child
and then inserting a G element with an I child. There are different ways of fixing this
inconsistency: either (a) allow all operations below element G or (b) forbid one of
the insert and delete operations at node G.

Now, suppose that the policy allows one to replace an A-element with a B-element
and a B-element with a J-element, but forbids the replacement of A with J elements. The latter
operation can be easily simulated by performing a sequence of the allowed operations. 
As in the previous case, the repairs that one can propose are (a) to allow the forbidden 
replace operation or (b) forbid one of the allowed operations. 

Our contributions: In this paper we consider policies that are defined in terms of 
non-recursive structured XML DTDs as introduced in [10] that capture without loss of 
generality more general non-recursive DTDs. We first consider total policies in which 
all allowed or forbidden privileges are explicitly specified. We define consistency for 
such policies and prove the correctness of a straightforward polynomial time algorithm 
for consistency checking. We also consider partial policies in which privileges may be 
omitted. Such a policy is consistent if it can be extended to a consistent total policy; 
there may be many such extensions, but we identify a canonical least-privilege consistent
extension, and show that this can be found in polynomial time (if it exists). Finally, 
given an inconsistent (partial or total) policy, we consider the problem of finding a "re- 
pair", or minimal changes to the policy which restore consistency. We consider repairs 
based on changing operations from allowed to forbidden, show that finding minimal 
repairs is NP-complete, and provide heuristic repair algorithms that run in polynomial 
time. 

The rest of this paper is structured as follows: in Section 2 we provide the definitions 
for XML DTDs and trees. Section 3 discusses i) the atomic updates and ii) the access 
control policies that we are considering. Consistency is discussed in Section 4; Section 5 
discusses algorithms for detecting and repairing inconsistent policies. We conclude in 
Section 6. Proofs of theorems and detailed algorithms can be found in the Appendix. 

2 XML DTDs and Trees 

We consider structured XML DTDs as discussed in [10]. Although not all DTDs are 
syntactically representable in this form, one can (as argued by [10]) represent more 
general DTDs by introducing new element types. The DTDs we consider here are 1- 
unambiguous as required by the XML standard [4] . 

Definition 1 (XML DTD). Let $\mathcal{L}$ be the infinite domain of labels. A DTD $D$ is
represented by $(Ele, Rg, rt)$ where i) $Ele \subseteq \mathcal{L}$ is a finite set of element types, ii) $rt$ is a
distinguished type in $Ele$ called the root type and iii) $Rg$ defines the element types: that
is, for any $A \in Ele$, $Rg(A)$ is a regular expression of the form:

$$Rg(A) := str \mid \epsilon \mid B_1, B_2, \ldots, B_n \mid B_1 + B_2 + \ldots + B_n \mid B_1^*$$

where $B_i \in Ele$ are distinct, ",", "+" and "*" stand for concatenation, disjunction and
Kleene star respectively, $\epsilon$ for the EMPTY element content and $str$ for text values.

We will refer to $A \to Rg(A)$ as the production rule for $A$. An element type $B_i$ that
appears in the production rule of an element type $A$ is called a subelement type of $A$.
We write $A \le_D B$ for the transitive, reflexive closure of the subelement relation.

A DTD can also be represented as a directed acyclic graph that we call a DTD graph.

Definition 2 (DTD Graph). A DTD graph $G_D = (V_D, E_D, r_D)$ for a DTD $D = (Ele, Rg, rt)$
is a directed acyclic graph (DAG) where i) $V_D$ is the set of nodes for
the element types in $Ele \cup \{str\}$, ii) $E_D = \{(A, B) \mid A, B \in Ele \text{ and } B \text{ is a subelement type of } A\}$
and iii) $r_D$ is the distinguished node $rt$.

Example 1. The production rules for the DTD graph shown in Fig. 1 are:

R -> A + B + J + K    D -> F*    G -> H + I    H -> str
A -> C, D             B -> E*    J -> G*       I -> str
C -> F*               E -> G*    F -> str      K -> str
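
The production rules of Example 1 can be written down as a small data structure. The sketch below is only illustrative: the dictionary encoding, the kind tags ("choice", "seq", "star", "str") and the helper names dtd_edges and below are assumptions made for this example, not notation from the paper. It computes the edges of the DTD graph and the set of element types B with $A \le_D B$.

```python
# Content models of the example DTD (Example 1), encoded as (kind, children).
# The encoding and the kind tags are illustrative assumptions.
DTD = {
    "R": ("choice", ["A", "B", "J", "K"]),
    "A": ("seq", ["C", "D"]),
    "B": ("star", ["E"]),
    "C": ("star", ["F"]),
    "D": ("star", ["F"]),
    "E": ("star", ["G"]),
    "J": ("star", ["G"]),
    "G": ("choice", ["H", "I"]),
    "F": ("str", []),
    "H": ("str", []),
    "I": ("str", []),
    "K": ("str", []),
}


def dtd_edges(dtd):
    """Edges (A, B) of the DTD graph: B is a subelement type of A."""
    return {(a, b) for a, (_, kids) in dtd.items() for b in kids}


def below(dtd, a):
    """All element types B with A <=_D B (reflexive, transitive closure)."""
    seen, stack = {a}, [a]
    while stack:
        for b in dtd[stack.pop()][1]:
            if b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


print(sorted(below(DTD, "B")))
```

Because the DTD is non-recursive, the traversal in below always terminates; the seen set only avoids revisiting shared descendants such as F or G.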



We model XML documents as rooted unordered trees with labels from $\mathcal{L} \cup \{str\}$.

Definition 3 (XML Tree). An unordered XML tree $t$ is an expression of the form
$t = (N_t, E_t, \lambda_t, r_t, v_t)$ where i) $N_t$ is the set of nodes, ii) $E_t \subseteq N_t \times N_t$ is the set of edges,
iii) $\lambda_t : N_t \to \mathcal{L} \cup \{str\}$ is a labeling function over nodes, iv) $r_t$ is the root of $t$ and is
a distinguished node in $N_t$ and v) $v_t$ is a function that assigns a string value to nodes
labeled with $str$.

We denote by $children_t(n)$, $parent_t(n)$ and $desc_t(n)$ the children, parent and descendant
nodes, respectively, of a node $n$ in an XML tree $t$. The set $desc^e_t(n)$ denotes the
edges in $E_t$ between descendant nodes of $n$. A node labeled with an element type $A$ in
DTD $D$ is called an instance of $A$.

We say that an XML tree $t = (N_t, E_t, \lambda_t, r_t, v_t)$ conforms to a DTD $D = (Ele, Rg, rt)$
at element type $A$ if i) $r_t$ is labeled with $A$ (i.e., $\lambda_t(r_t) = A$), ii) each node in $N_t$ is
labeled with either an $Ele$ element type $B$ or with $str$, iii) each node in $t$ labeled with an
$Ele$ element type $B$ has a list of children nodes such that their labels are in the language
defined by $Rg(B)$ and iv) each node in $t$ labeled with $str$ has a string value ($v_t(n)$ is
defined) and is a leaf of the tree. An XML tree $t$ is a valid instance of the DTD $D$ if $r_t$
is labeled with $rt$. We write $I_D(A)$ for the set of valid instances of $D$ at element type
$A$, and $I_D$ for $I_D(rt)$.

Definition 4 (XML Tree Isomorphism). We say that an XML tree $t_1$ is isomorphic to
an XML tree $t_2$, denoted $t_1 \equiv t_2$, iff there exists a bijection $h : N_{t_1} \to N_{t_2}$ where: i)
$h(r_{t_1}) = r_{t_2}$, ii) if $(x, y) \in E_{t_1}$ then $(h(x), h(y)) \in E_{t_2}$, iii) $\lambda_{t_1}(x) = \lambda_{t_2}(h(x))$, and
iv) $v_{t_1}(x) = v_{t_2}(h(x))$ for every $x$ with $\lambda_{t_1}(x) = str = \lambda_{t_2}(h(x))$.

3 XML Access Control Framework 
3.1 Atomic Updates 

Our updates are modeled on the XQuery Update Facility draft [7], which considers 
delete, replace and several insert update operations. A delete(n) operation will delete
node n and all its descendants. A replace(n, t) operation will replace the subtree with
root n by the tree t. A replace(n, s) operation will replace the text value of node
n with string s. There are several types of insert operations, e.g., insert into(n, t),
insert before(n, t), insert after(n, t), insert as first(n, t), insert as last(n, t). Update
insert into(n, t) inserts the root of t as a child of n whereas update insert as first(n, t)
(insert as last(n, t)) inserts the root of t as a first (resp. last) child of n. Update operations
insert before(n, t) and insert after(n, t) insert the root node of t as a preceding
and following sibling of n, respectively.

Since we only consider unordered XML trees, we deal only with the operation
insert into(n, t) (for readability, we write insert(n, t)). Thus, in
what follows, we restrict ourselves to four types of update operations: delete(n), replace(n, t),
replace(n, s) and insert(n, t).

More formally, for a tree $t_1 = (N_{t_1}, E_{t_1}, \lambda_{t_1}, r_{t_1}, v_{t_1})$, a node $n$ in $t_1$, a tree
$t_2 = (N_{t_2}, E_{t_2}, \lambda_{t_2}, r_{t_2}, v_{t_2})$ and a string value $s$, the result of applying insert($n$, $t_2$),
replace($n$, $t_2$), delete($n$) and replace($n$, $s$) to $t_1$ is a new tree $t = (N_t, E_t, \lambda_t, r_t, v_t)$
defined as shown in Table 1. We denote by $[\![op]\!](t)$ the result of applying update operation
$op$ on tree $t$.

Table 1. Semantics of update operations

An update operation insert($n$, $t_2$), replace($n$, $t_2$), replace($n$, $s$) or delete($n$) is valid
with respect to tree $t_1$ provided $n \in N_{t_1}$ and $t_2$, if present, does not overlap with $t_1$ (that
is, $N_{t_1} \cap N_{t_2} = \emptyset$). We also consider update sequences $op_1; \ldots; op_n$ with the (standard)
semantics $[\![op_1; \ldots; op_n]\!](t_1) = [\![op_n]\!]([\![op_{n-1}]\!](\cdots [\![op_1]\!](t_1)))$. A sequence of updates
$op_1; \ldots; op_n$ is valid with respect to $t_0$ if for each $i \in \{0, \ldots, n-1\}$, $op_{i+1}$ is valid with
respect to $t_i$, where $t_1 = [\![op_1]\!](t_0)$, $t_2 = [\![op_2]\!](t_1)$, etc. The result of a valid update (or
valid sequence of updates) exists and is unique up to tree isomorphism.
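
To make the update semantics concrete, the following is a minimal sketch of insert, delete and replace(n, t) on unordered trees, applied to the motivating example. It rests on simplifying assumptions that are not part of the paper: trees are nested tuples, a node is addressed by its path of child positions instead of a node identifier, and isomorphism is tested by sorting children recursively. The text update replace(n, s) is omitted for brevity.

```python
def node(label, *children, text=None):
    """Build a tree (label, text, children); text is only used on "str" leaves."""
    return (label, text, tuple(children))


def _edit(t, path, f):
    """Copy of t in which the children of the node at `path` become f(children)."""
    label, text, kids = t
    if not path:
        return (label, text, f(kids))
    i = path[0]
    return (label, text, kids[:i] + (_edit(kids[i], path[1:], f),) + kids[i + 1:])


def insert(t, path, sub):
    """insert(n, t'): add the root of `sub` as a child of the node at `path`."""
    return _edit(t, path, lambda kids: kids + (sub,))


def delete(t, path):
    """delete(n): drop the node at `path` together with all its descendants."""
    *parent, i = path
    return _edit(t, tuple(parent), lambda kids: kids[:i] + kids[i + 1:])


def replace(t, path, sub):
    """replace(n, t'): substitute `sub` for the subtree rooted at `path`."""
    *parent, i = path
    return _edit(t, tuple(parent), lambda kids: kids[:i] + (sub,) + kids[i + 1:])


def canon(t):
    """Canonical form of an unordered tree: children sorted recursively."""
    label, text, kids = t
    return (label, text or "", tuple(sorted(canon(k) for k in kids)))


def isomorphic(t1, t2):
    return canon(t1) == canon(t2)


# Document R -> J -> G -> H -> "x" (it conforms to the example DTD).
doc = node("R", node("J", node("G", node("H", node("str", text="x")))))
new_i = node("I", node("str", text="y"))

# Replacing the H-element by an I-element directly ...
direct = replace(doc, (0, 0, 0), new_i)
# ... versus deleting the G-element and inserting a fresh G-element with an I child.
simulated = insert(delete(doc, (0, 0)), (0,), node("G", new_i))

print(isomorphic(direct, simulated))
```

The two results are isomorphic, which is exactly the situation described in the motivating example: a policy that allows inserting and deleting G-elements below J cannot meaningfully forbid replacing H by I.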

3.2 Access Control Framework 

We use the notion of update access type to specify the access authorizations in our 
context. Our update access types are inspired by the $XACU^{annot}$ language discussed
in [12], whose authors followed the idea of security annotations introduced in [10] to specify
the access authorizations for XML documents in the presence of a DTD.

Definition 5 (Update Access Types). Given a DTD $D$, an update access type (UAT)
defined over $D$ is of the form $(A, insert(B_1))$, $(A, replace(B_1, B_2))$, $(A, replace(str, str))$
or $(A, delete(B_1))$, where $A$ is an element type in $D$, $B_1$ and $B_2$ are subelement
types of $A$ and $B_1 \ne B_2$.

Intuitively, a UAT represents a set of atomic update operations. More specifically, for
$t$ an instance of DTD $D$, $op$ an atomic update and $uat$ an update access type we say that
$op$ matches $uat$ on $t$ ($op \text{ matches}_t\ uat$) if:

$$\frac{\lambda_t(n) = A \quad t' \in I_D(B)}{insert(n, t') \text{ matches}_t\ (A, insert(B))} \qquad \frac{\lambda_t(n) = B \quad \lambda_t(parent_t(n)) = A}{delete(n) \text{ matches}_t\ (A, delete(B))}$$

$$\frac{\lambda_t(n) = B, \quad t' \in I_D(B'), \quad \lambda_t(parent_t(n)) = A, \quad B \ne B'}{replace(n, t') \text{ matches}_t\ (A, replace(B, B'))}$$

$$\frac{\lambda_t(n) = str, \quad \lambda_t(parent_t(n)) = A}{replace(n, s) \text{ matches}_t\ (A, replace(str, str))}$$

It is trivial to translate our update access types to $XACU^{annot}$ security annotations.
In this work we assume that the evaluation of an update operation on a tree that conforms
to a DTD $D$ results in a tree that conforms to $D$. It is clear then that each update
access type only makes sense for specific element types. For our example DTD, the
update access type $(A, delete(C))$ is not meaningful because allowing the deletion of
a C-element would result in an XML document that does not conform to the DTD,
and therefore, the update will be rejected. Similarly for $(R, delete(A))$ or $(R, insert(A))$.
But $(B, delete(E))$ and $(B, insert(E))$ are relevant for this specific DTD. The relation
$uat \text{ valid\_in } D$, which indicates that an update access type $uat$ is valid for the DTD $D$,
is defined as follows:

$$\frac{Rg(A) := B_1^*}{(A, insert(B_1)) \text{ valid\_in } D} \qquad \frac{Rg(A) := B_1^*}{(A, delete(B_1)) \text{ valid\_in } D}$$

$$\frac{Rg(A) := str}{(A, replace(str, str)) \text{ valid\_in } D} \qquad \frac{Rg(A) := B_1 + \cdots + B_n, \quad i, j \in [1, n], \quad i \ne j}{(A, replace(B_i, B_j)) \text{ valid\_in } D}$$

We define the set of valid UATs for a given DTD $D$ as $valid(D) = \{uat \mid uat \text{ valid\_in } D\}$.
A security policy will be defined by a set of allowed and forbidden valid UATs.
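
The valid_in rules translate almost directly into code. The following sketch enumerates $valid(D)$ for the example DTD; the tuple encoding of UATs, such as ("B", "insert", "E"), and the dictionary encoding of the DTD are assumptions made for illustration only.

```python
# Example DTD of Fig. 1, encoded as (kind, children); the encoding is an assumption.
DTD = {
    "R": ("choice", ["A", "B", "J", "K"]), "A": ("seq", ["C", "D"]),
    "B": ("star", ["E"]), "C": ("star", ["F"]), "D": ("star", ["F"]),
    "E": ("star", ["G"]), "J": ("star", ["G"]), "G": ("choice", ["H", "I"]),
    "F": ("str", []), "H": ("str", []), "I": ("str", []), "K": ("str", []),
}


def valid_uats(dtd):
    """Enumerate valid(D), following the four valid_in rules."""
    uats = set()
    for a, (kind, kids) in dtd.items():
        if kind == "star":        # Rg(A) := B*  gives insert(B) and delete(B)
            (b,) = kids
            uats |= {(a, "insert", b), (a, "delete", b)}
        elif kind == "str":       # Rg(A) := str gives replace(str, str)
            uats.add((a, "replace", "str", "str"))
        elif kind == "choice":    # Rg(A) := B1 + ... + Bn gives replace(Bi, Bj), i != j
            uats |= {(a, "replace", bi, bj) for bi in kids for bj in kids if bi != bj}
        # kind == "seq" (and EMPTY content) yields no valid UAT
    return uats


VALID = valid_uats(DTD)
print(len(VALID))
```

As expected from the discussion above, ("A", "delete", "C") is not produced, because A has a sequence content model, while ("B", "insert", "E") and ("B", "delete", "E") are.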

Definition 6. A security policy $P$ defined over a DTD $D$ is represented by $(\mathcal{A}, \mathcal{F})$
where $\mathcal{A}$ is the set of allowed and $\mathcal{F}$ the set of forbidden update access types defined
over $D$ such that $\mathcal{A} \subseteq valid(D)$, $\mathcal{F} \subseteq valid(D)$ and $\mathcal{A} \cap \mathcal{F} = \emptyset$. A security policy is
total if $\mathcal{A} \cup \mathcal{F} = valid(D)$, otherwise it is partial.

Example 2. Consider the DTD $D$ in Fig. 1 and the total policy $P = (\mathcal{A}, \mathcal{F})$ where $\mathcal{A}$ is:

(R, replace(A, B))      (R, replace(B, J))      (R, replace(J, K))        (R, replace(K, J))
(R, replace(K, B))      (C, insert(F))          (C, delete(F))            (D, insert(F))
(D, delete(F))          (F, replace(str, str))  (B, insert(E))            (B, delete(E))
(E, insert(G))          (E, delete(G))          (G, replace(I, H))        (J, insert(G))
(J, delete(G))          (H, replace(str, str))  (I, replace(str, str))    (K, replace(str, str))

and $\mathcal{F} = valid(D) \setminus \mathcal{A}$. On the other hand, $P = (\mathcal{A}, \emptyset)$ is a partial policy. □

The operations that are allowed by a policy $P = (\mathcal{A}, \mathcal{F})$ on an XML tree $t$, denoted
by $[\![\mathcal{A}]\!](t)$, are the union of the atomic update operations matching each UAT in $\mathcal{A}$.
More formally, $[\![\mathcal{A}]\!](t) = \{op \mid op \text{ matches}_t\ uat \text{ on } t, \text{ and } uat \in \mathcal{A}\}$. We say an update
sequence $op_1; \ldots; op_n$ is allowed on $t$ provided the sequence is valid on $t$ and
$op_1 \in [\![\mathcal{A}]\!](t)$, $op_2 \in [\![\mathcal{A}]\!]([\![op_1]\!](t))$, etc.^1 Analogously, the forbidden operations are
$[\![\mathcal{F}]\!](t) = \{op \mid op \text{ matches}_t\ uat \text{ on } t, \text{ and } uat \in \mathcal{F}\}$. If a policy $P$ is total, its semantics is
given by its allowed updates, i.e. $[\![P]\!](t) = [\![\mathcal{A}]\!](t)$. The semantics of a partial policy is
studied in detail in Section 4.1.



4 Consistent Policies 

A policy is said to be consistent if it is not possible to simulate a forbidden update 
through a sequence of allowed updates. More formally: 

Definition 7. A policy $P = (\mathcal{A}, \mathcal{F})$ defined over $D$ is consistent if for every XML tree
$t$ that conforms to $D$, there does not exist a sequence $op_1; \ldots; op_n$ of updates that is
allowed on $t$ and an update $op_0 \in [\![\mathcal{F}]\!](t)$ such that:

$$[\![op_1; \ldots; op_n]\!](t) = [\![op_0]\!](t).$$

In our framework inconsistencies can be classified as: insert/delete and replace. 

Inconsistencies due to insert/delete operations arise when the policy allows one to
insert and delete nodes of element type $A$ whilst forbidding some operation in some
descendant element type of the node. In this case, the forbidden operation can be simulated
by first deleting an $A$-element and then inserting a new $A$-element after having
done the necessary modifications.

^1 Note that this is not the same as $\{op_1, \ldots, op_n\} \subseteq [\![\mathcal{A}]\!](t)$.

There are two kinds of inconsistencies created by replace operations on a production
rule $A \to B_1 + \cdots + B_n$ of a DTD. First, if we are allowed to replace $B_i$ by $B_j$ and
$B_j$ by $B_k$ but not $B_i$ by $B_k$, then one can simulate the latter operation by a sequence
of the first two. Second, consider that we are allowed to replace some element type $B_i$
with an element type $B_j$ and vice versa. If some operation in the subtree of either $B_i$
or $B_j$ is forbidden, then it is evident that one can simulate the forbidden operation by a
sequence of allowed operations, leading to an inconsistency.

We say that nothing is forbidden below $A$ in a policy $P = (\mathcal{A}, \mathcal{F})$ defined over $D$
if for every $B_i$ s.t. $A \le_D B_i$, $(B_i, op) \notin \mathcal{F}$ for every $(B_i, op) \in valid(D)$. If
$A \to B_1 + \ldots + B_n$, then we define the replace graph $\mathcal{G}_A = (\mathcal{V}_A, \mathcal{E}_A)$ where i) $\mathcal{V}_A$ is the set
of nodes for $B_1, B_2, \ldots, B_n$ and ii) $(B_i, B_j) \in \mathcal{E}_A$ if there exists $(A, replace(B_i, B_j)) \in \mathcal{A}$.
Also, the set of forbidden edges of $A$ is $\mathcal{F}_A = \{(B_i, B_j) \mid (A, replace(B_i, B_j)) \in \mathcal{F}\}$.
We say that a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is transitive if $(x, y), (y, z) \in \mathcal{E}$ implies $(x, z) \in \mathcal{E}$.
We write $\mathcal{G}_A^+$ for the transitive closure of $\mathcal{G}_A$. The following theorem characterizes policy
consistency:

Theorem 1. A policy $P = (\mathcal{A}, \mathcal{F})$ defined over DTD $D$ is consistent if and only if for
every production rule:

1. $A \to B^*$ in $D$, if $(A, insert(B)) \in \mathcal{A}$ and $(A, delete(B)) \in \mathcal{A}$, then nothing is
forbidden below $B$,

2. $A \to B_1 + \cdots + B_n$ in $D$, for every edge $(B_i, B_j)$ in $\mathcal{G}_A^+$, $(B_i, B_j) \notin \mathcal{F}_A$, and

3. $A \to B_1 + \cdots + B_n$ in $D$, for every $i \in [1, \ldots, n]$, if $B_i$ is contained in a cycle in
$\mathcal{G}_A$ then nothing is forbidden below $B_i$.

Proof (Sketch). The forward direction is straightforward, since if any of the rules are 
violated an inconsistency can be found, as sketched above. For the reverse direction, 
we first need to reduce allowed update sequences to certain (allowed) normal forms 
that are easier to analyze, then the reasoning proceeds by cases. A full proof is given in 
Appendix A. □ 

In the case of total policies, condition 2 in Theorem 1 amounts to requiring that the
replace graph $\mathcal{G}_A$ is transitive (i.e., $\mathcal{G}_A = \mathcal{G}_A^+$).

Example 3. (Example 2 continued) The total policy $P$ is inconsistent because:

- $(E, insert(G))$ and $(E, delete(G))$ are in $\mathcal{A}$, but $(G, replace(H, I)) \in \mathcal{F}$ (condition
1, Theorem 1),

- $(R, replace(A, J))$, $(R, replace(A, K))$ and $(R, replace(B, K))$ are in $\mathcal{F}$, although
$(A, J)$, $(A, K)$ and $(B, K)$ are edges of $\mathcal{G}_R^+$ (condition 2, Theorem 1), and

- There are cycles in $\mathcal{G}_R$ involving both $B$ and $J$, but below both of them there is a
forbidden UAT, namely $(G, replace(H, I))$ (condition 3, Theorem 1).

It is easy to see that we can check whether properties 1, 2, and 3 hold for a policy using 
standard graph algorithms: 

Proposition 1. The problem of deciding policy consistency is in PTIME. 
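
The three conditions of Theorem 1 can be checked with a transitive-closure computation per choice rule. The sketch below is one possible reading of that check, not the algorithm of the appendix; the encodings of the DTD and of UATs as tuples, and the helper names (below, closure, is_consistent, ins_del), are assumptions made for this example. It is run on the policy of Example 2.

```python
DTD = {
    "R": ("choice", ["A", "B", "J", "K"]), "A": ("seq", ["C", "D"]),
    "B": ("star", ["E"]), "C": ("star", ["F"]), "D": ("star", ["F"]),
    "E": ("star", ["G"]), "J": ("star", ["G"]), "G": ("choice", ["H", "I"]),
    "F": ("str", []), "H": ("str", []), "I": ("str", []), "K": ("str", []),
}


def ins_del(a, b):
    return {(a, "insert", b), (a, "delete", b)}


# Policy of Example 2: ALLOWED is the set A, FORBIDDEN is valid(D) minus A.
ALLOWED = (
    {("R", "replace", x, y) for x, y in ["AB", "BJ", "JK", "KJ", "KB"]}
    | ins_del("C", "F") | ins_del("D", "F") | ins_del("B", "E")
    | ins_del("E", "G") | ins_del("J", "G")
    | {("G", "replace", "I", "H")}
    | {(a, "replace", "str", "str") for a in "FHIK"}
)
FORBIDDEN = (
    {("R", "replace", x, y) for x, y in ["AJ", "AK", "BA", "BK", "JA", "JB", "KA"]}
    | {("G", "replace", "H", "I")}
)


def below(dtd, a):
    """All element types B with A <=_D B."""
    seen, stack = {a}, [a]
    while stack:
        for b in dtd[stack.pop()][1]:
            if b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def closure(edges):
    """Transitive closure of a set of edges (Warshall's algorithm)."""
    reach = set(edges)
    nodes = {x for e in edges for x in e}
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if (i, k) in reach and (k, j) in reach:
                    reach.add((i, j))
    return reach


def is_consistent(dtd, allowed, forbidden):
    def clean(b):  # "nothing is forbidden below b"
        return not any(u[0] in below(dtd, b) for u in forbidden)

    for a, (kind, kids) in dtd.items():
        if kind == "star":
            b = kids[0]
            if ins_del(a, b) <= allowed and not clean(b):
                return False                      # condition 1 violated
        elif kind == "choice":
            reach = closure({(u[2], u[3]) for u in allowed if u[0] == a})
            if any((a, "replace", x, y) in forbidden for (x, y) in reach):
                return False                      # condition 2 violated
            if any(x == y and not clean(x) for (x, y) in reach):
                return False                      # condition 3 violated
    return True


print(is_consistent(DTD, ALLOWED, FORBIDDEN))
```

A node lies on a cycle of the replace graph exactly when the closure contains the pair (x, x), which is how condition 3 is tested here. Each step is polynomial in the size of the DTD and the policy, in line with Proposition 1.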
Remark 1. We wish to emphasize that consistency is highly sensitive to the design of
policies and update types. For example, we have consciously chosen to omit an update
type $(A, replace(B_i, B_i))$ for an element type in the DTD whose production rule is either
of the form $B_i^*$ or $B_1 + \ldots + B_n$. Consider the case of a conference management
system where a paper element has a decision and a title subelement. Suppose that the
policy allows the author of the paper to replace a paper with another paper element,
but forbids changing the value of the decision subelement. This policy is inconsistent
since by replacing a paper element by another with a different decision subelement we
are able to perform a forbidden update. In fact, replace(paper, paper) can simulate
any other update type applying below a paper element. Thus, if the policy forbids
replacement of paper nodes, then it would be inconsistent to allow any other operation on
decision and title. Because of this problem, we argue that update types replace($B_i$, $B_i$)
should not be used in policies. Instead, more specific privileges should be assigned
individually, e.g., by allowing replacement of the text values of title or decision.

4.1 Partial Policies 

Partial policies may be smaller and easier to maintain than total policies, but are am- 
biguous because some permissions are left unspecified. An access control mechanism 
must either allow or deny a request. One solution to this problem (in accordance with 
the principle of least privilege) might be to deny access to the unspecified operations. 
However, there is no guarantee that the resulting total policy is consistent. Indeed, it is 
not obvious that a partial policy (even if consistent) has any consistent total extension. 
We will now show how to find consistent extensions, if they exist, and in particular how 
to find a "least-privilege" consistent extension; these turn out to be unique when they 
exist so seem to be a natural choice for defining the meaning of a partial policy. 

For convenience, we write $\mathcal{A}_P$ and $\mathcal{F}_P$ for the allowed and forbidden sets of a
policy $P$; i.e., $P = (\mathcal{A}_P, \mathcal{F}_P)$. We introduce an information ordering $P \sqsubseteq Q$, defined
as $\mathcal{A}_P \subseteq \mathcal{A}_Q$ and $\mathcal{F}_P \subseteq \mathcal{F}_Q$; that is, $Q$ is "more defined" than $P$. In this case, we say
that $Q$ extends $P$. We say that a partial policy $P$ is quasiconsistent if it has a consistent
total extension. For example, a partial policy on the DTD of Figure 1 which allows
$(B, insert(E))$, $(B, delete(E))$, and denies $(H, replace(str, str))$ is not quasiconsistent,
because any consistent extension of the policy has to allow $(H, replace(str, str))$.

We also introduce a privilege ordering on total policies $P \le Q$, defined as $\mathcal{A}_P \subseteq \mathcal{A}_Q$;
that is, $Q$ allows every operation that is allowed in $P$. This ordering has unique
greatest lower bounds $P \wedge Q$ defined as $(\mathcal{A}_P \cap \mathcal{A}_Q, \mathcal{F}_P \cup \mathcal{F}_Q)$. We now show that
every quasiconsistent policy has a least-privilege consistent extension $P^\dagger$; that is, $P^\dagger$ is
consistent and $P^\dagger \le Q$ whenever $Q$ is a consistent extension of $P$.

Lemma 1. If $P_1, P_2$ are consistent total extensions of $P_0$ then $P_1 \wedge P_2$ is also a consistent
extension of $P_0$.

Proof. It is easy to see that if $P_1, P_2$ extend $P_0$ then $P_1 \wedge P_2$ extends $P_0$. Suppose
$P_1 \wedge P_2$ is inconsistent. Then there exist an XML tree $t$, an atomic operation
$op_0 \in [\![\mathcal{F}_{P_1 \wedge P_2}]\!](t)$, and a sequence $\overline{op}$ allowed on $t$ by $P_1 \wedge P_2$, such that $[\![op_0]\!](t) = [\![\overline{op}]\!](t)$.
Now $\mathcal{A}_{P_1 \wedge P_2} = \mathcal{A}_{P_1} \cap \mathcal{A}_{P_2}$, so $op_0$ must be forbidden by either $P_1$ or $P_2$. On the other
hand, $\overline{op}$ must be allowed by both $P_1$ and $P_2$, so $t$, $op_0$, $\overline{op}$ forms a counterexample to
the consistency of $P_1$ (or symmetrically $P_2$). □



Proposition 2. Each quasiconsistent policy $P$ has a unique $\le$-least consistent total
extension $P^\dagger$.

Proof. Since $P$ is quasiconsistent, the set $S = \{Q \mid P \sqsubseteq Q, Q \text{ consistent}\}$ is finite,
nonempty, and closed under $\wedge$, so has a $\le$-least element $P^\dagger = \bigwedge S$. □

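Finally, we show how to find the least-privilege consistent extension, or determine that
none exists (and hence that the partial policy is not quasiconsistent). Define the operator
$T : \mathcal{P}(valid(D)) \to \mathcal{P}(valid(D))$ as:

$$\begin{aligned}
T(S) = S &\cup \{(C, uat) \mid B \le_D C,\ Rg_D(A) = B^*,\ \{(A, insert(B)), (A, delete(B))\} \subseteq S\} \\
&\cup \{(C, uat) \mid B_i \le_D C,\ Rg_D(A) = B_1 + \ldots + B_n,\ (B_i, B_i) \in \mathcal{G}_A^+(S)\} \\
&\cup \{(A, replace(B_i, B_j)) \mid Rg_D(A) = B_1 + \ldots + B_n,\ (B_i, B_j) \in \mathcal{G}_A^+(S)\}
\end{aligned}$$

Here $\mathcal{G}_A^+(S)$ denotes the transitive closure of the replace graph of $A$ built from the UATs in $S$.

Lemma 2. If $uat \in T(S)$ then any operation $op_0$ matching $uat$ on $t$ can be simulated
using a sequence of operations $\overline{op}$ allowed on $t$ by $S$ (that is, such that $[\![op_0]\!](t) = [\![\overline{op}]\!](t)$).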

Theorem 2. Let $P$ be a partial policy. The following are equivalent: (1) $P$ is quasiconsistent,
(2) $P$ is consistent, (3) $T(\mathcal{A}_P) \cap \mathcal{F}_P = \emptyset$.

Proof. To show (1) implies (2), if $P'$ is a consistent extension of $P$, then any inconsistency
in $P$ would be an inconsistency in $P'$, so $P$ must be consistent. To show (2)
implies (3), we prove the contrapositive. If $T(\mathcal{A}_P) \cap \mathcal{F}_P \ne \emptyset$ then choose
$uat \in T(\mathcal{A}_P) \cap \mathcal{F}_P$. Choose an arbitrary tree $t$ and atomic update $op_0$ satisfying $op_0 \in [\![uat]\!](t)$.
By Lemma 2, there exists a sequence $\overline{op}$ allowed by $\mathcal{A}_P$ on $t$ with $[\![\overline{op}]\!](t) = [\![op_0]\!](t)$.
Hence, policy $P$ is inconsistent. Finally, to show that (3) implies (1), note that
$(T(\mathcal{A}_P), valid(D) \setminus T(\mathcal{A}_P))$ extends $P$ and is consistent provided $T(\mathcal{A}_P) \cap \mathcal{F}_P = \emptyset$.

Indeed, for a (quasi-)consistent $P$, the least-privilege consistent extension of $P$ is simply
$P^\dagger = (T(\mathcal{A}_P), valid(D) \setminus T(\mathcal{A}_P))$ (proof omitted). Hence, we can decide whether
a partial policy is (quasi-)consistent and if so find $P^\dagger$ in PTIME.
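
The test of Theorem 2 and the construction of the least-privilege extension can be sketched as follows. This is a hedged reading of the operator T as it can be recovered from the text: the treatment of cycles through pairs (x, x) in the closure, the tuple encoding of UATs and all function names are assumptions of this example rather than the paper's implementation.

```python
DTD = {
    "R": ("choice", ["A", "B", "J", "K"]), "A": ("seq", ["C", "D"]),
    "B": ("star", ["E"]), "C": ("star", ["F"]), "D": ("star", ["F"]),
    "E": ("star", ["G"]), "J": ("star", ["G"]), "G": ("choice", ["H", "I"]),
    "F": ("str", []), "H": ("str", []), "I": ("str", []), "K": ("str", []),
}


def below(dtd, a):
    seen, stack = {a}, [a]
    while stack:
        for b in dtd[stack.pop()][1]:
            if b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def closure(edges):
    reach = set(edges)
    nodes = {x for e in edges for x in e}
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if (i, k) in reach and (k, j) in reach:
                    reach.add((i, j))
    return reach


def valid_uats(dtd):
    uats = set()
    for a, (kind, kids) in dtd.items():
        if kind == "star":
            uats |= {(a, "insert", kids[0]), (a, "delete", kids[0])}
        elif kind == "str":
            uats.add((a, "replace", "str", "str"))
        elif kind == "choice":
            uats |= {(a, "replace", x, y) for x in kids for y in kids if x != y}
    return uats


def T(dtd, s):
    """UATs that can be simulated with the UATs in s (one reading of the operator T)."""
    valid, out = valid_uats(dtd), set(s)

    def everything_below(b):
        return {u for u in valid if u[0] in below(dtd, b)}

    for a, (kind, kids) in dtd.items():
        if kind == "star":
            b = kids[0]
            if {(a, "insert", b), (a, "delete", b)} <= s:
                out |= everything_below(b)
        elif kind == "choice":
            for x, y in closure({(u[2], u[3]) for u in s if u[0] == a}):
                if x == y:                      # x lies on a cycle of the replace graph
                    out |= everything_below(x)
                else:
                    out.add((a, "replace", x, y))
    return out


def least_privilege_extension(dtd, allowed, forbidden):
    """Return (A, F) of the least-privilege total extension, or None if none exists."""
    t = T(dtd, allowed)
    if t & forbidden:
        return None
    return t, valid_uats(dtd) - t


partial_allowed = {("B", "insert", "E"), ("B", "delete", "E")}
print(least_privilege_extension(DTD, partial_allowed, {("H", "replace", "str", "str")}))
```

The call above reproduces the example from the beginning of this section: allowing insertion and deletion of E below B forces everything below E to be allowed, so denying (H, replace(str, str)) leaves no consistent extension.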

5 Repairs 

If a policy is inconsistent, we would like to suggest possible minimal ways of modifying 
it in order to restore consistency. In other words, we would like to find repairs that are 
as close as possible to the inconsistent policy. 

There are several ways of defining these repairs. We might want to repair by changing
the permissions of certain operations from allowed to forbidden and vice versa; or we
might give preference to some type of changes over others. Also, we can measure the 
minimality of the repairs as a minimal number of changes or a minimal set of changes 
under set inclusion. 

Due to space restrictions, in this paper we will focus on finding repairs that trans- 
form UATs from allowed to forbidden and that minimize the number of changes. We 
believe that such repairs are a useful special case, since the repairs are guaranteed to be 
more restrictive than the original policy.

Definition 8. A policy $P' = (\mathcal{A}', \mathcal{F}')$ is a repair of a policy $P = (\mathcal{A}, \mathcal{F})$ defined over
a DTD $D$ iff: i) $P'$ is a policy defined over $D$, ii) $P'$ is consistent, and iii) $P' \le P$.

A repair is total if $\mathcal{F}' = valid(D) \setminus \mathcal{A}'$ and partial otherwise. Furthermore, a repair
$P' = (\mathcal{A}', \mathcal{F}')$ of $P = (\mathcal{A}, \mathcal{F})$ is a minimal-total-repair if there is no total repair
$P'' = (\mathcal{A}'', \mathcal{F}'')$ such that $|\mathcal{A}'| < |\mathcal{A}''|$, and a minimal-partial-repair if $\mathcal{F}' = \mathcal{F}$ and there is no
partial repair $P'' = (\mathcal{A}'', \mathcal{F})$ such that $|\mathcal{A}'| < |\mathcal{A}''|$.

Given a policy $P = (\mathcal{A}, \mathcal{F})$ and an integer $k$, the total-repair (partial-repair) problem
consists in determining if there exists a total-repair (partial-repair) $P' = (\mathcal{A}', \mathcal{F}')$ of
policy $P$ such that $|\mathcal{A} \setminus \mathcal{A}'| < k$. This problem can be shown to be NP-hard by reduction
from the edge-deletion transitive-digraph problem [19].

Theorem 3. The total-repair and partial-repair problems are NP-complete.

If the DTD has no production rules of the type $A \to B_1 + \cdots + B_n$, then the total-repair
problem is in PTIME.

5.1 Repair Algorithm 

In this section we discuss a repair algorithm that finds a minimal repair of a total or 
partial policy. All the algorithms can be found in Appendix B. 

The algorithm to compute a minimal repair of a policy relies on the independence
between inconsistencies w.r.t. insert/delete (Theorem 1, condition 1) and replace (Theorem 1,
conditions 2 and 3) operations. In fact, a local repair of an inconsistency w.r.t.
insert/delete operations will never solve nor create an inconsistency with respect to a replace
operation and vice versa. We will separately describe the algorithm for repairing
the insert/delete inconsistencies and then the algorithm for the replace ones.

Both algorithms make use of the marked DTD graph $MG_D = (G_D, \mu, \chi)$ where
$\mu$ is a function from nodes in $V_D$ to {"+", "-"} and $\chi$ is a partial function from $V_D$
to $\{\bot\}$. In a marked graph for a DTD $D$ and a policy $P = (\mathcal{A}, \mathcal{F})$, each node in the
graph is either marked with "+" (i.e., nothing is forbidden below the node) or with a
"-" (i.e., there exists at least one update access type that is forbidden below the node).
If, for nodes $A$ and $B$ in the DTD, both $(A, insert(B))$ and $(A, delete(B))$ are in $\mathcal{A}$
and $\mu(A) =$ "-", then $\chi(A) = \bot$. A marked graph is obtained from algorithm
markGraph, which takes as input a DTD graph and a policy $P$, traverses the
DTD graph starting from the nodes with out-degree 0, and marks the nodes and edges
as discussed above.
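
The marking can be sketched with a memoized post-order traversal, which visits children before parents and so has the same effect as starting from the leaves. This is not the markGraph algorithm of Appendix B, only an illustration of the definitions of $\mu$ and $\chi$; the data encodings and the name mark_graph are assumptions. The policy is the one of Example 2.

```python
DTD = {
    "R": ("choice", ["A", "B", "J", "K"]), "A": ("seq", ["C", "D"]),
    "B": ("star", ["E"]), "C": ("star", ["F"]), "D": ("star", ["F"]),
    "E": ("star", ["G"]), "J": ("star", ["G"]), "G": ("choice", ["H", "I"]),
    "F": ("str", []), "H": ("str", []), "I": ("str", []), "K": ("str", []),
}


def ins_del(a, b):
    return {(a, "insert", b), (a, "delete", b)}


ALLOWED = (
    {("R", "replace", x, y) for x, y in ["AB", "BJ", "JK", "KJ", "KB"]}
    | ins_del("C", "F") | ins_del("D", "F") | ins_del("B", "E")
    | ins_del("E", "G") | ins_del("J", "G")
    | {("G", "replace", "I", "H")}
    | {(a, "replace", "str", "str") for a in "FHIK"}
)
FORBIDDEN = (
    {("R", "replace", x, y) for x, y in ["AJ", "AK", "BA", "BK", "JA", "JB", "KA"]}
    | {("G", "replace", "H", "I")}
)


def mark_graph(dtd, allowed, forbidden):
    """Return (mu, chi): mu maps each node to "+" or "-", chi is the set of nodes marked bottom."""
    mu = {}

    def visit(a):
        if a not in mu:
            ok = not any(u[0] == a for u in forbidden)   # no forbidden UAT at a itself
            for b in dtd[a][1]:
                ok = (visit(b) == "+") and ok            # and nothing forbidden below its children
            mu[a] = "+" if ok else "-"
        return mu[a]

    for a in dtd:
        visit(a)

    chi = set()
    for a, (kind, kids) in dtd.items():
        if kind == "star" and mu[a] == "-" and ins_del(a, kids[0]) <= allowed:
            chi.add(a)   # insert and delete both allowed at a, something forbidden below
    return mu, chi


mu, chi = mark_graph(DTD, ALLOWED, FORBIDDEN)
print(sorted(chi))
```

For each node in chi, removing either its insert or its delete UAT from the allowed set repairs the insert/delete inconsistency at that node.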

Example 4. Consider the graph for DTD $D$ in Fig. 2(a) and policy $P = (\mathcal{A}, \mathcal{F})$ with
$\mathcal{A}$ defined in Example 2. The result of applying markGraph to this DTD and policy
is shown in Fig. 2(b). Notice that nodes B, E and J are marked with both a "-" and
"$\bot$" since i) update access type $(G, replace(H, I))$ is in $\mathcal{F}$ and ii) all insert and delete
update access types for B, E and J are in $\mathcal{A}$. For readability purposes we do not show
the multiplicities in the marked DTD graph. □

Fig. 2. DTD Graph (a) and Marked DTD Graph (b) for the DTD in Fig. 1 



Repairing Inconsistencies for Insert and Delete Operations Recall that if both the
insert and delete operations are allowed at some element type and there is some operation
below this type that is not allowed, then there is an inconsistency (see Theorem 1,
condition 1). The marked DTD graph provides exactly this information: a node
$A$ is labeled with "$\bot$" if it is inconsistent w.r.t. insert/delete operations. For each such
node and for the repair strategy that we have chosen, the inconsistency can be minimally
repaired by removing either $(A, insert(B))$ or $(A, delete(B))$ from $\mathcal{A}$. Algorithm
InsDelRepair takes as input a DTD graph $G_D$ and a security policy $P = (\mathcal{A}, \mathcal{F})$
and returns a set of UATs to remove from $\mathcal{A}$ to restore consistency w.r.t. insert/delete-
inconsistencies.

Example 5. Given the marked DTD graph in Fig. 2(b), it is easy to see that the UATs
that must be repaired are associated with nodes B, J and E (all of these nodes are marked with
"$\bot$"). The repairs that can be proposed to the user are to remove from $\mathcal{A}$ one UAT
from each of the following sets: $\{(B, insert(E)), (B, delete(E))\}$, $\{(E, insert(G)), (E, delete(G))\}$
and $\{(J, insert(G)), (J, delete(G))\}$. □

Repairing Inconsistencies for Replace Operations There are two types of inconsistencies
related to replace operations (see Theorem 1, conditions 2-3): the first arises
when some element $A$ is contained in some cycle and something is forbidden below it;
the second arises when the replace graph $\mathcal{G}_A$ cannot be extended to a transitive graph
without adding a forbidden edge in $\mathcal{F}$. In what follows we will refer to these types of
inconsistencies as negative-cycle and forbidden-transitivity, respectively. By Theorem 3, the repair
problem is NP-complete, and therefore, unless P = NP, there is no polynomial time algorithm
to compute a minimal repair to the replace-inconsistencies. Our objective then
is to find an algorithm that runs in polynomial time and computes a repair that is not
necessarily minimal.

Algorithm ReplaceNaive traverses the marked graph $MG_D$ and at each node
checks whether its production rule is of the form $A \to B_1 + \ldots + B_n$. If this is the
case, it builds the replace graph for $A$, $\mathcal{G}_A$, and runs a modified version of the
Floyd-Warshall algorithm [11]. The original Floyd-Warshall algorithm adds an edge $(B, D)$ to
the graph if there is a node $C$ such that $(B, C)$ and $(C, D)$ are in the graph and $(B, D)$
is not. Our modification consists in deleting either $(B, C)$ or $(C, D)$ if $(B, D) \in \mathcal{F}_A$,
i.e., if there is forbidden-transitivity. In this way, the final graph will satisfy condition 2
of Theorem 1. Also, if there are edges $(B, C)$ and $(C, B)$ and $\mu(C) =$ "-", i.e., there
is a negative-cycle, one of the two edges is deleted. Algorithm ReplaceNaive returns
the set of edges to delete from each node to remove replace-inconsistencies.

Example 6. The replace graph $\mathcal{G}_C$ has neither negative-cycles nor forbidden-transitivity,
therefore it is not involved in any inconsistency. On the other hand, the replace graph
$\mathcal{G}_R = (\mathcal{V}, \mathcal{E})$, shown in Fig. 3(a), is the source of many inconsistencies. A possible
execution of ReplaceNaive (shown in Fig. 7 in the Appendix) is: $(A, B), (B, J) \in \mathcal{E}$
but $(A, J) \in \mathcal{F}$, so (A, B) or (B, J) should be deleted, say (A, B). Now, $(B, J), (J, K)
\in \mathcal{E}$ and $(B, K) \in \mathcal{F}$, therefore we delete either (B, J) or (J, K), say (B, J). Next,
$(K, J), (J, K) \in \mathcal{E}$ and $\mu(J)$ = "-" in Fig. 2(b), therefore there is a negative-cycle
and either (K, J) or (J, K) has to be deleted. If (K, J) is deleted, the resulting graph
has neither forbidden-transitivity nor negative-cycles. The policy obtained by removing
(R, replace(A, B)), (R, replace(B, J)) and (R, replace(J, K)) from $\mathcal{A}$ has no replace-
inconsistencies. □
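The naive strategy can be sketched in Python as follows. This is a minimal, simplified variant rather than Algorithm ReplaceNaive itself: instead of the stack-based traversal it repeatedly looks for a path that witnesses a forbidden edge, or a cycle through a negatively marked node, and deletes the first edge of that witness (the paper chooses the edge at random). The names `find_path` and `naive_replace_repair` are hypothetical, and the edge set, forbidden set and negative nodes used for $\mathcal{G}_R$ are assumptions inferred from Examples 7 and 8, since Fig. 3 is not reproduced here.

```python
from collections import deque


def find_path(edges, src, dst):
    """Return a shortest non-empty path src -> dst as a list of edges, or None.

    When src == dst the result is a shortest cycle through src.
    """
    adj = {}
    for a, b in sorted(edges):
        adj.setdefault(a, []).append(b)
    queue = deque([(src, [])])
    seen = set()
    while queue:
        node, path = queue.popleft()
        for nxt in adj.get(node, []):
            step = path + [(node, nxt)]
            if nxt == dst:
                return step
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, step))
    return None


def naive_replace_repair(edges, forbidden, negative):
    """Delete edges until no forbidden edge is derivable by transitivity
    and no negatively marked node lies on a cycle."""
    edges = set(edges)
    removed = []
    while True:
        witness = None
        # Forbidden-transitivity: a path from a to c although (a, c) is forbidden.
        for a, c in sorted(forbidden):
            witness = find_path(edges, a, c)
            if witness:
                break
        # Negative-cycle: a cycle through a node marked "-".
        if witness is None:
            for v in sorted(negative):
                witness = find_path(edges, v, v)
                if witness:
                    break
        if witness is None:
            return removed, edges
        # Deterministic stand-in for the random choice made in the paper.
        edges.discard(witness[0])
        removed.append(witness[0])


# Assumed reading of the replace graph of R (see the lead-in).
EDGES = {("A", "B"), ("B", "J"), ("J", "K"), ("K", "J"), ("K", "B")}
FORBIDDEN = {("A", "J"), ("A", "K"), ("B", "K"), ("J", "B")}
NEGATIVE = {"B", "J", "K"}

removed, remaining = naive_replace_repair(EDGES, FORBIDDEN, NEGATIVE)
print(removed)
```

Under these assumptions the sketch deletes three edges, (A, B), (B, J) and (J, K), which is the same number of deletions as the execution described in Example 6.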

The ReplaceNaive algorithm might remove more edges than necessary to
achieve consistency: in our example, if we had removed edge (B, J) at the first step,
then we would have resolved the inconsistencies that involve edges (A, B), (B, J) and
(J, K).

An alternative to algorithm ReplaceNaive, which can find a solution closer to a minimal
repair, is algorithm ReplaceSetCover, which also uses a modified version of
the Floyd-Warshall algorithm. In this case, the modification consists in computing the
transitive closure of the replace graph $\mathcal{G}_A$ and labelling each newly constructed edge e
with a set of justifications $\mathcal{J}$. Each justification contains sets of edges of $\mathcal{G}_A$ that were
used to add e in $\mathcal{G}_A^+$. Also, if a node is found to be part of a negative-cycle, it is labelled
with the justifications $\mathcal{J}$ of the edges in each cycle that contains the node. An
edge or vertex might be justified by more than one set of edges. In fact, the number of
justifications an edge or node might have is $O(2^{|\mathcal{E}|})$. To avoid the exponential number
of justifications, ReplaceSetCover assigns at most $\mathfrak{J}$ justifications to each edge or
node, where $\mathfrak{J}$ is a fixed number. This new labelled graph is then used to construct an
instance of the minimum set cover problem (MSCP) [17]. The solution to the MSCP
can be used to determine the set of edges to remove from $\mathcal{G}_A$ so that none of the
justifications that create inconsistencies are valid anymore. Because of the upper bound $\mathfrak{J}$
on the number of justifications, it might be the case that the graph still has forbidden-
transitivity or negative-cycles. Thus, the justifications have to be computed once more
and the set cover run again until there are no more replace-inconsistencies.
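The bounded computation of justifications can be illustrated with a short Python sketch. It is a simplified fixed-point formulation, not the loop order of Algorithm ComputeJustifications, and it makes two assumptions that are not in the paper: a key (a, a) stands for a cycle through node a (the node justification), and cycles are recorded for every node rather than only for nodes marked "-". The function name and the edge set of the example graph are likewise assumed for illustration.

```python
def compute_justifications(edges, bound):
    """Label every pair (a, c) joined by a path with at most `bound` justifications.

    A justification is a frozenset of original edges whose composition yields
    the pair. A key (a, a) records a cycle through node a.
    """
    just = {e: [frozenset([e])] for e in sorted(edges)}
    changed = True
    while changed:
        changed = False
        for (a, b) in sorted(just):
            for (b2, c) in sorted(just):
                if b2 != b:
                    continue
                target = just.setdefault((a, c), [])
                # Iterate over snapshots because `target` may alias these lists.
                for j1 in tuple(just[(a, b)]):
                    for j2 in tuple(just[(b, c)]):
                        combined = j1 | j2
                        if len(target) < bound and combined not in target:
                            target.append(combined)
                            changed = True
    return just


# Assumed edge set of the replace graph of R used in the running example.
EDGES = {("A", "B"), ("B", "J"), ("J", "K"), ("K", "J"), ("K", "B")}
labels = compute_justifications(EDGES, bound=1)
for pair in [("A", "J"), ("B", "K"), ("J", "B"), ("A", "K")]:
    print(pair, [sorted(j) for j in labels[pair]])
```

Because at most `bound` justifications are kept per pair, the loop performs a polynomial number of insertions; which justifications survive depends on the iteration order, so a different order may keep a different (but still valid) witness.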

Example 7. For $\mathfrak{J} = 1$, the first computation of justifications of ReplaceSetCover
results in the graph in Fig. 3(b) with the following justifications:

\[
\begin{array}{ll}
\mathcal{J}((A, J)) = \{\{(A, B), (B, J)\}\} & \mathcal{J}((J, B)) = \{\{(J, K), (K, B)\}\} \\
\mathcal{J}((A, K)) = \{\{(A, B), (B, J), (J, K)\}\} & \mathcal{J}(B) = \{\{(B, J), (J, K), (K, B)\}\} \\
\mathcal{J}((B, K)) = \{\{(B, J), (J, K)\}\} & \mathcal{J}(J) = \{\{(J, K), (K, J)\}\}
\end{array}
\]

Justifications for edges represent violations of transitivity. Justifications for nodes
represent negative-cycles. If we want to remove the inconsistencies, it is enough to delete
one edge from each set in $\mathcal{J}$. □

Fig. 3. Replace graph $\mathcal{G}_R$ (a) and transitive replace graph $\mathcal{G}_R^+$ (b)

The previous example shows that, for each node A, replace-inconsistencies can be
repaired by removing at least one edge from each of the justifications of edges and vertices
in $\mathcal{G}_A^+$. It is easy to see that this problem can be reduced to the MSCP. An instance of
the MSCP consists of a universe $\mathcal{U}$ and a set $\mathcal{S}$ of subsets of $\mathcal{U}$. A subset $\mathcal{C}$ of $\mathcal{S}$ is a set
cover if the union of the elements in it is $\mathcal{U}$. A solution of the MSCP is a set cover
with the minimum number of elements.

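The set cover instance associated to $\mathcal{G}_A^+ = (\mathcal{V}, \mathcal{E})$ and the set of forbidden edges
$\mathcal{F}_A$ is $MSCP(\mathcal{G}_A^+, \mathcal{F}_A) = (\mathcal{U}, \mathcal{S})$ for i) $\mathcal{U} = \{s \mid s \in \mathcal{J}(e), e \in \mathcal{F}_A\} \cup \{s \mid s \in \mathcal{J}(v),
v \in \mathcal{V}\}$, and ii) $\mathcal{S} = \bigcup_{e \in \mathcal{E}} \{\mathcal{I}(e)\}$ where $\mathcal{I}(e) = \{s \mid s \in \mathcal{U}, e \in s\}$. Intuitively, $\mathcal{U}$ contains
all the inconsistencies, and the set $\mathcal{I}(e)$ the replace-inconsistencies in which an edge e
is involved. Notice that in this instance of the MSCP, $\mathcal{U}$ is a set of justifications;
therefore, $\mathcal{S}$ is a set of sets of justifications.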

Example 8. The minimum set cover instance, $MSCP(\mathcal{G}_R^+, \mathcal{F}_R) = (\mathcal{U}, \mathcal{S})$, is such that
$\mathcal{U}$ = {{(A, B), (B, J), (J, K)}, {(A, B), (B, J)}, {(B, J), (J, K)}, {(J, K), (K, B)},
{(J, K), (K, J)}, {(K, J), (J, K)}, {(B, J), (J, K), (K, B)}} and $\mathcal{S}$ = {$\mathcal{I}$((A, B)),
$\mathcal{I}$((B, J)), $\mathcal{I}$((J, K)), $\mathcal{I}$((K, J)), $\mathcal{I}$((K, B))}. The extensions of $\mathcal{I}$ are given in Table 2,
where each column corresponds to a set $\mathcal{I}$ and each row to an element in $\mathcal{U}$. Values 1 and
0 in the table represent membership and non-membership respectively. A minimum set
cover of $MSCP(\mathcal{G}_R^+, \mathcal{F}_R)$ is $\mathcal{C}$ = {$\mathcal{I}$((B, J)), $\mathcal{I}$((J, K))}, since $\mathcal{I}$((J, K)) covers all the elements
of $\mathcal{U}$ except for the element {(A, B), (B, J)}, which is covered by $\mathcal{I}$((B, J)). Now, using
the solution from the set cover, we remove edges (B, J) and (J, K) from $\mathcal{G}_R$. If we try
to compute the justifications once again, it turns out that there are no more negative-
cycles and that the graph is transitive. Therefore, by removing (R, replace(B, J)) and
(R, replace(J, K)) from $\mathcal{A}$, there are no replace-inconsistencies in node R. □

The set cover problem is MAXSNP-hard [17], but its solution can be approximated
in polynomial time using a greedy algorithm that achieves an approximation factor
of log(n), where n is the size of $\mathcal{U}$ [8]. In our case, n is $O(\mathfrak{J} \times |\mathcal{E}_R^+|)$. In the ongoing
example, the approximation algorithm for the set cover will return a cover of size 2. This
is better than what was obtained by the ReplaceNaive algorithm. In order to decide
which of the two algorithms is preferable in general, we need to run experiments to
investigate the trade-off between efficiency and the size of the repaired policy.
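The greedy approximation can be sketched as follows on the instance of Example 8. The function name `greedy_edge_cover` is hypothetical, and ties are broken by the sorted order of the edges, which is an assumption of this sketch and not something the paper prescribes. Choosing the set $\mathcal{I}(e)$ is represented directly by choosing the edge e.

```python
def greedy_edge_cover(justifications):
    """Greedily pick edges until every justification contains a picked edge.

    Picking edge e corresponds to picking the set I(e) of all justifications
    that contain e in the set cover instance.
    """
    universe = {frozenset(j) for j in justifications}
    edges = sorted({e for j in universe for e in j})
    covers = {e: {j for j in universe if e in j} for e in edges}
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # max() returns the first maximal element, so ties follow sorted order.
        best = max(edges, key=lambda e: len(covers[e] & uncovered))
        chosen.append(best)
        uncovered -= covers[best]
    return chosen


# Universe U of Example 8 (the repeated cycle {(J,K),(K,J)} collapses to one set).
U = [
    {("A", "B"), ("B", "J"), ("J", "K")},
    {("A", "B"), ("B", "J")},
    {("B", "J"), ("J", "K")},
    {("J", "K"), ("K", "B")},
    {("J", "K"), ("K", "J")},
    {("K", "J"), ("J", "K")},
    {("B", "J"), ("J", "K"), ("K", "B")},
]
cover = greedy_edge_cover(U)
print(cover)
```

On this instance the greedy choice starts with (J, K), which hits every justification except {(A, B), (B, J)}, and one more edge completes a cover of size 2. With the tie-breaking used here the second edge is (A, B) rather than (B, J), so the greedy cover need not coincide with the one given in Example 8.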
Table 2. Set cover problem


Algorithm ReplaceRepair will compute the set of UATs to remove from $\mathcal{A}$, by
using either ReplaceNaive (if $\mathfrak{J} = 0$) or ReplaceSetCover (if $\mathfrak{J} > 0$).

Computation of a Repair Algorithm Repair computes a new consistent policy $P' =
(\mathcal{A}', \mathcal{F}')$ from $P = (\mathcal{A}, \mathcal{F})$ by removing from $\mathcal{A}$ the union of the UATs returned by
algorithms InsDelRepair and ReplaceRepair. If argument total of algorithm Repair
is true, then the repair returned by it will be total. If false, then a partial policy such
that $\mathcal{F}' = \mathcal{F}$ will be returned.
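The final assembly step can be written compactly. The sketch below assumes that UATs are represented as hashable tuples and that the sets of changes have already been produced by the two repair procedures; the function name `assemble_repair` and the toy UATs are hypothetical.

```python
def assemble_repair(allowed, forbidden, valid, ins_del_changes, replace_changes, total):
    """Build the repaired policy (A', F') from P = (A, F).

    `valid` is the set of all UATs that are valid for the DTD; it is only
    consulted when a total repair is requested.
    """
    changes = set(ins_del_changes) | set(replace_changes)
    new_allowed = set(allowed) - changes
    if total:
        # Total repair: everything that is not allowed becomes forbidden.
        new_forbidden = set(valid) - new_allowed
    else:
        # Partial repair: the forbidden set is left unchanged.
        new_forbidden = set(forbidden)
    return new_allowed, new_forbidden


# Toy data, purely illustrative.
A = {("R", "replace(A,B)"), ("R", "replace(B,J)"), ("B", "insert(E)")}
F = {("R", "replace(A,J)")}
VALID = A | F | {("B", "delete(E)")}

partial = assemble_repair(A, F, VALID, [], [("R", "replace(B,J)")], total=False)
total = assemble_repair(A, F, VALID, [], [("R", "replace(B,J)")], total=True)
print(partial)
print(total)
```

In the partial case removed privileges become unspecified, whereas in the total case they move to the forbidden set.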

Theorem 4. Given a total (partial) policy P, algorithm Repair returns a total (partial)
repair of P.

6 Conclusion 

Access control policies attempt to constrain the actual operations users can perform, but
are usually enforced in terms of syntactic representations of the operations. Thus, policies
controlling update access to XML data may forbid certain operations but permit
other operations that have the same effect. In this paper we have studied such
inconsistency vulnerabilities and shown how to check consistency and repair inconsistent
policies. This is, to our knowledge, the first investigation of consistency and repairs
for XML update security. We also considered consistency and repair problems for partial
policies, which may be more convenient to write since many privileges may be left
unspecified.

Cautis, Abiteboul and Milo [5] discuss XML update constraints to restrict insert
and delete updates, and propose to detect updates that violate these constraints by
measuring the size of the modification of the database. This approach differs from our
security framework for two reasons: a) in addition to insert and delete, we also consider
replace operations, and b) we require that each operation in the sequence of updates does
not violate the security constraints, whereas they require only that the
input and output databases satisfy them.

Minimal repairs are used in the problem of returning consistent answers from
inconsistent databases [1]. There, a consistent answer is defined in terms of all the minimal
repairs of a database. In [3] the set cover problem was used to find repairs of databases
w.r.t. denial constraints.

There are a number of possible directions for future work, including running
experiments for the proposed algorithms, studying consistency for more general security
policies specified using XPath expressions or constraints, investigating the complexity
of and algorithms for other classes of repairs, and considering more general DTDs.
A Proofs

A.1 Proofs from Section 4

In this appendix we outline a detailed proof of correctness for our characterization of 
policy consistency (Theorem 1). The proof is not deep, but requires considering many 
combinations of cases. The main difficulty is in proving that rules 1, 2, and 3 imply 
consistency, since this involves showing that for a consistent policy, there is no way 
to simulate a single forbidden operation via a sequence of allowed operations. The 
obvious approach by induction on the length of the allowed sequence does not work 
because subsequences of the allowed sequence do not necessarily continue to simulate 
the denied operation. 

The solution is to establish the existence of an appropriate normal form for update
sequences, such that (roughly speaking):

1. The normal form of an update sequence $\alpha$ applied to input t is

   delete($n_1$); ...; delete($n_i$); r; insert($l_1$, $w_1$); ...; insert($l_j$, $w_j$)

   consisting of a sequence of deletes, then replacements, then inserts.

2. The replacements r can be partitioned into "chained" subsequences $r_1, \ldots, r_k$ that
   are of the form $r_i$ = replace($m_i$, $u_1$); replace($r_{u_1}$, $u_2$); ...

3. Each $n_i$, $m_j$, $l_k$ is in t.

4. No deleted or replaced node ($n_i$ or $m_j$) is an ancestor of another of the modified
   nodes ($n_i$, $m_j$, $l_k$).

5. Allowed update sequences have allowed normal forms.

Pictorially, a normalized update sequence can be visualized as a tree with some of its
nodes "annotated" with insertion operations insert(u), deletions delete, and replacement
sequences replace($u_1, \ldots, u_n$), such that no annotation occurs below a node with
a delete or replace annotation. Such annotations can be viewed as instructions for how
to construct $[\![\alpha]\!](t)$ from t.

Normalized update sequences are much easier to analyze than arbitrary allowed
sequences in the proof of the reverse direction of Theorem 1.

We introduce some additional helpful notation: write

node(delete(n)) = n
node(insert(n, u)) = n
node(replace(n, u)) = n

for the "principal" node of an operation; write $\le_t$ for the ancestor-descendant ordering
on t (that is, $E_t^*$); write $\perp_t$ for the relation $\{(n, m) \in N_t \times N_t \mid n \not\le_t m$ and $m \not\le_t n\}$
(that is, $n \perp_t m$ means n and m are $\le_t$-incomparable).

Proposition 3. Let P be a security policy and $\alpha$ an allowed update sequence mapping
t to t'. Then there is an equivalent allowed update sequence $\alpha'$ that is in normal form.

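Proof. We first note that the laws in Figures 4, 5, and 6 are valid for rewriting update
sequences relative to a given input tree t. We write op = op' to indicate that the (partial)
functions $[\![op]\!](-)$ and $[\![op']\!](-)$ are equal; that is, for any tree t, op is valid on t if and
only if op' is valid on t, and if both are valid, then $[\![op]\!](t) = [\![op']\!](t)$.

\[
\mathrm{insert}(n, u); \mathrm{insert}(m, v) =
\begin{cases}
\mathrm{insert}(n, [\![\mathrm{insert}(m, v)]\!](u)) & \text{if } m \in N_u \\
\mathrm{insert}(m, v); \mathrm{insert}(n, u) & \text{if } m \notin N_u
\end{cases}
\]
\[
\mathrm{insert}(n, u); \mathrm{replace}(m, v) =
\begin{cases}
\mathrm{replace}(m, v) & \text{if } n \in N_t,\ m \le_t n \\
\mathrm{replace}(m, v); \mathrm{insert}(n, u) & \text{if } n \in N_t,\ m \not\le_t n \\
\mathrm{insert}(n, v) & \text{if } m = r_u \\
\mathrm{insert}(n, [\![\mathrm{replace}(m, v)]\!](u)) & \text{if } m \in N_u - \{r_u\}
\end{cases}
\]
\[
\mathrm{insert}(n, u); \mathrm{delete}(m) =
\begin{cases}
\mathrm{delete}(m) & \text{if } m \le_t n \\
\mathrm{delete}(m); \mathrm{insert}(n, u) & \text{if } m \in N_t,\ m \not\le_t n \\
\epsilon & \text{if } m = r_u \\
\mathrm{insert}(n, [\![\mathrm{delete}(m)]\!](u)) & \text{if } m \in N_u - \{r_u\}
\end{cases}
\]

Fig. 4. Moving inserts forward

\[
\mathrm{replace}(n, u); \mathrm{delete}(m) =
\begin{cases}
\mathrm{delete}(m) & \text{if } m \le_t n \\
\mathrm{delete}(m); \mathrm{replace}(n, u) & \text{if } m \in N_t,\ m \not\le_t n \\
\mathrm{delete}(n) & \text{if } m = r_u \\
\mathrm{replace}(n, [\![\mathrm{delete}(m)]\!](u)) & \text{if } m \in N_u - \{r_u\}
\end{cases}
\]
\[
\mathrm{delete}(n); \mathrm{delete}(m) =
\begin{cases}
\mathrm{delete}(m) & \text{if } m \le_t n \\
\mathrm{delete}(m); \mathrm{delete}(n) & \text{if } m \not\le_t n
\end{cases}
\]

Fig. 5. Moving deletes backward

\[
\mathrm{replace}(n, u); \mathrm{replace}(m, v) =
\begin{cases}
\mathrm{replace}(m, v) & \text{if } m \le_t n \\
\mathrm{replace}(m, v); \mathrm{replace}(n, u) & \text{if } m \in N_t,\ m \not\le_t n \\
\mathrm{replace}(n, [\![\mathrm{replace}(m, v)]\!](u)) & \text{if } m \in N_u - \{r_u\}
\end{cases}
\]

Fig. 6. Chaining and commuting replacements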

We can use these identities to normalize an update sequence as follows. First, move 
occurrences of inserts to the end of the sequence. Next, move deletes to the beginning 
of the sequence. Finally, we use the remaining rules to eliminate dependencies among 
deletes, replacements and inserts, and to build chains of replacements. The resulting 
sequence is in normal form. 

Note that most of the identities only rearrange existing allowed updates and do not
introduce any new update operations that we need to check against the policy. In a few
cases, we need to do some work to check that the rewritten sequence is still allowed.
For example, when we rewrite replace(n, u); delete(m) to delete(n) with $m = r_u$, we
need to verify that we are allowed to delete m; this is because we were allowed to delete
n, which replaced m.

We say that two trees agree above n if the trees are equal after deleting the subtree
rooted at n from each. Note that for all of the operations we consider, if op has principal
node n and op is valid on t then t agrees with $[\![op]\!](t)$ above n.

Lemma 3. If t and t' are equal except under the subtree starting at n, and allowed
sequence $\alpha$ maps t to t', then there is an equivalent, normalized, allowed sequence $\alpha'$
that only affects nodes at or above n.

Proof. We show that for each node m unrelated to n, updates applying directly to m
can be eliminated. If a deletion applies to m, then there must be an insertion replacing the
deleted subtree exactly, and these are the only updates affecting m. Thus, it is safe to
remove this useless deletion-insertion pair. If a replacement applies to m, then there
must be subsequent replacements that restore the subtree at m. This sequence of
replacements can be eliminated. No other possibilities are consistent with t and t' being
equal except at n. Thus, by considering each node m in the tree that is unrelated to n,
and removing the updates having an effect on m, we can obtain an equivalent update
sequence $\alpha'$ having only updates whose principal node is related to n. This update
sequence is still allowed since we have only removed allowed operations (and since all of
the operations we have removed are independent of the remaining ones), and can also
be further normalized if necessary.

If t, t' agree above n, and $\alpha$ is an allowed sequence, then we define the n-related normal
form of $\alpha$ to be an equivalent allowed, normalized sequence of operations affecting
the tree above or below n, which must exist by the above lemma.

Proof of Theorem 1. For the forward direction, we prove the contrapositive. As argued 
in Section 4, any violations of the above properties suffice to show that a policy is 
inconsistent. 

For the reverse direction, we again prove the contrapositive. Suppose P is inconsistent,
and let t be a tree, $\alpha$ a sequence allowed on t, and d denied on t by P, such that
$[\![\alpha]\!](t) = [\![d]\!](t)$. We consider the four cases for d:

- d = insert(n, t). Consider the normal form of $\alpha$ restricted to the updates related
  to n. Clearly $\alpha$ cannot consist only of updates at or below n, since an insertion at
  n cannot be simulated by a deletion or replacement at n or by any operations that
  only apply below n. If there is a deletion above n, there must also be an insertion
  above n that restores the extra deleted nodes and also has the effect of insert(n, t).
  Hence there is a violation of rule 1. Otherwise, if there is a replacement above node
  n, then there must be one or more replacements restoring the rest of the tree to
  its previous form and inserting t, violating rule 3 (since the chain of replacements
  must be allowed by a cycle in some graph $\mathcal{G}_A$).

- d = delete(n, t), replace(n, s). Similar to the case for insert, since again these operations
  cannot be simulated solely by operations at or below n.

- d = replace(n, v). There are two possibilities. If the n-related normal form of $\alpha$
  consists only of replacements at n, then the policy must violate rule 2. Otherwise,
  an argument similar to that in the above cases can be used to show that P must
  violate rule 1 or 3. □

Proof of Proposition 1. By Theorem 1, there are two cases in which a policy can be
inconsistent. The first case can be checked by traversing the DTD graph following
a topological sorting of its nodes. This can be done in time polynomial in the
number of edges and vertices of the DTD graph.

The second case consists of checking whether the graphs $\mathcal{G}_A$ are acyclic and transitive.
Checking these two conditions for each element A can be done in polynomial time. □
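The two graph checks mentioned in the proof are standard, and a minimal Python sketch makes the polynomial bound concrete. It only tests plain transitivity and acyclicity of a directed graph given as a set of edges; how these tests are combined with the marks and forbidden edges of Theorem 1 is not shown. The function names are hypothetical.

```python
def is_transitive(edges):
    """True if (a, b) and (b, d) in the graph always imply (a, d).

    Runs in O(|E|^2) time over the edge set.
    """
    edges = set(edges)
    return all(
        (a, d) in edges
        for (a, b) in edges
        for (c, d) in edges
        if b == c
    )


def is_acyclic(nodes, edges):
    """Kahn's algorithm: the graph is acyclic iff every node can be removed
    after all of its predecessors. Runs in O(|V| + |E|) time."""
    nodes = set(nodes)
    indegree = {v: 0 for v in nodes}
    successors = {v: [] for v in nodes}
    for a, b in edges:
        successors[a].append(b)
        indegree[b] += 1
    ready = [v for v in nodes if indegree[v] == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return removed == len(nodes)


dag = {("A", "B"), ("B", "C"), ("A", "C")}
cyclic = {("J", "K"), ("K", "J")}
print(is_transitive(dag), is_acyclic({"A", "B", "C"}, dag))
print(is_transitive(cyclic), is_acyclic({"J", "K"}, cyclic))
```

Note that under this definition a two-node cycle is transitive only if both self-loops are present, which is why the second graph fails both tests.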



Proof of Lemma 1. Since both P and Q extend R, we have $\mathcal{A}_P, \mathcal{A}_Q \supseteq \mathcal{A}_R$ and
$\mathcal{D}_P, \mathcal{D}_Q \supseteq \mathcal{D}_R$; hence

\[
\mathcal{A}_P \cap \mathcal{A}_Q \supseteq \mathcal{A}_P \cap \mathcal{A}_R = \mathcal{A}_R
\]

Proof of Lemma 2. By cases according to the definition of T. If $uat \in S$ then there is
nothing to do.

If for some A, B we have uat = (C, op) with $B \le_D C$, with production rule $A \to B^*$,
{(A, insert(B)), (A, delete(B))} $\subseteq S$, then let n = node($op_0$), let m be the
B-labeled node above n in t (there must be exactly one), and let t' be the subtree of
t rooted at m. We can simulate $op_0$ by deleting the B-labeled subtree to which $op_0$
applies, then inserting the tree resulting from applying $op_0$; thus, the sequence op =
delete(m); insert(n, $[\![op_0]\!](t')$) simulates $op_0$ and is allowed.

If for some A, B we have uat = (C, op) with $B_i \le_D C$, $Rg_D(A) = B_1 + \ldots +
B_m$, $(B_i, B_i) \in \mathcal{G}_A^+(S)$, then let $B_{i_1}, \ldots, B_{i_k}$ be a cycle in $\mathcal{G}_A$ beginning and ending
with $B_i$. Again let n = node($op_0$), m be the (unique) $B_i$-labeled node above n, and t'
be the subtree of t rooted at m. Let $t_1, \ldots, t_{k-1}$ be arbitrary trees disjoint from t and
satisfying $t_j \in I_D(B_{i_j})$. (The latter sets are always nonempty so such trees may be
found.) Now consider the update sequence

op = replace(m, $t_1$); replace($r_{t_1}$, $t_2$); ...; replace($r_{t_{k-2}}$, $t_{k-1}$); replace($r_{t_{k-1}}$, $[\![op_0]\!](t')$)

This update sequence is allowed on t and simulates $op_0$.

Finally, if for some $B_1, \ldots, B_n$ we have uat = (C, replace($B_i$, $B_j$)), where $Rg_D(C) =
B_1 + \ldots + B_n$, $(B_i, B_j) \in \mathcal{G}_C^+(S)$, then let n = node($op_0$) and let t' be the subtree rooted at
n. Let $B_{i_1}, \ldots, B_{i_k}$ be a sequence of nodes forming a path from $B_i = B_{i_1}$ to $B_j = B_{i_k}$
in $\mathcal{G}_C$, and choose $t_1, \ldots, t_{k-1}$ satisfying $t_l \in I_D(B_{i_l})$. Then the update sequence

op = replace(n, $t_1$); replace($r_{t_1}$, $t_2$); ...; replace($r_{t_{k-2}}$, $t_{k-1}$); replace($r_{t_{k-1}}$, $[\![op_0]\!](t')$)

again is allowed and simulates $op_0$. □

A.2 Proofs from Section 5 

Proof of Theorem 3. We will concentrate on the total-repair problem. The proof for 
partial-repair problem is analogous. 

First we will prove that the total-repair problem is in NP. We can determine if there is a
repair $P' = (\mathcal{A}', \mathcal{F}')$ of P such that $|\mathcal{A} \setminus \mathcal{A}'| < k$ by guessing a policy P', checking if
$|\mathcal{A} \setminus \mathcal{A}'| < k$ and if it is consistent. Since consistency and the distance can be checked
in polynomial time, the problem is in NP.

To prove that the problem is NP-hard, we reduce from the edge-deletion transitive-digraph
problem, which is NP-complete [20, 19]. The problem consists in, given a directed graph
$\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with $\mathcal{V} = \{v_1, \ldots, v_n\}$ and $\mathcal{E}$ a set of edges without self-loops, determining
if there exists a graph $\mathcal{G}' = (\mathcal{V}, \mathcal{E}')$ such that $\mathcal{E}' \subseteq \mathcal{E}$, $\mathcal{G}'$ is transitive and $|\mathcal{E} \setminus \mathcal{E}'| < k$.
Now, let us define a DTD D and a policy P. The production rules of D are:

  $A \to v_1 + \ldots + v_n$
  $v_i \to$ str   for $i \in [1, n]$

The policy $P = (\mathcal{A}, \mathcal{F})$ is such that $\mathcal{A}$ = {(A, replace($v_i$, $v_j$)) | $(v_i, v_j) \in \mathcal{E}$} $\cup$ {($v_i$,
replace(str, str)) | $v_i \in \mathcal{V}$} and $\mathcal{F}$ = valid(D) \ $\mathcal{A}$. It is easy to see that $\mathcal{G}_A = \mathcal{G}$ and
therefore finding a repair will consist of finding the minimal number of edges to delete
from $\mathcal{G}$ to make the graph transitive. □

Proof of Theorem 4. Given an inconsistent policy $P = (\mathcal{A}, \mathcal{F})$, let us assume, by
contradiction, that the policy $P' = (\mathcal{A}', \mathcal{F}')$ returned by algorithm Repair is not a
repair. Since P' is defined over D, and by construction $P' \le P$, this implies that P' is
not consistent. Then, it should be the case that the changes returned by either:

1. InsDelRepair do not solve all the insert/delete-inconsistencies. This implies that
   there is a node A with production rule $A \to B^*$ such that (A, insert(B)) $\in \mathcal{A}'$,
   (A, delete(B)) $\in \mathcal{A}'$ and there is at least one forbidden UAT, say (C, op), such
   that $B \le_D C$. Since $P' \le P$, (A, insert(B)) $\in \mathcal{A}$ and (A, delete(B)) $\in \mathcal{A}$. If
   we prove that there is always an operation $(\hat{C}, \hat{op}) \in \mathcal{F}$ such that $B \le_D \hat{C}$, the
   marked DTD graph would be such that $\chi(A)$ = "⊥". Then, either (A, insert(B)) or
   (A, delete(B)) would have been in the changes returned by InsDelRepair and
   one of them would not have belonged to P'. Now we will prove that such $(\hat{C}, \hat{op})$
   always exists. If (C, op) $\in \mathcal{F}$, then $(\hat{C}, \hat{op})$ = (C, op). On the other hand, if
   (C, op) $\notin \mathcal{F}$ then (C, op) is one of the changes returned by either InsDelRepair
   or ReplaceRepair:

   (a) If (C, op) was a change returned by InsDelRepair, then there was an insert-
       delete inconsistency, and there is another UAT (F, op2) $\in \mathcal{F}$ such that $C \le_D
       F$. As a consequence $B \le_D F$, and we have found $(\hat{C}, \hat{op})$.

   (b) If (C, op) was a change returned by ReplaceRepair, this would mean that
       (C, op) was involved in either a negative-cycle or forbidden-transitivity. The
       former implies there is another UAT (F, op2) $\in \mathcal{F}$ such that $C \le_D F$. Then, $B
       \le_D F$, and we have found $(\hat{C}, \hat{op})$. The latter case implies there is at least one
       other (C, op2) $\in \mathcal{F}$. We have found $(\hat{C}, \hat{op})$.

2. ReplaceRepair do not solve all the replace-inconsistencies. This implies that
   there is a node A with production rule $A \to B_1 + \ldots + B_n$ such that one of the
   following holds:

   (a) There is an edge $(B_i, B_j)$ in $\mathcal{G}_A^+$ for P', s.t. $(B_i, B_j) \in \mathcal{F}'_A$. If $(B_i, B_j) \in
       \mathcal{F}_A$, then ReplaceRepair would have deleted at least one edge from each
       justification of $(B_i, B_j)$, and therefore, $(B_i, B_j)$ could not be in $\mathcal{G}_A^+$ for P'.
       On the other hand, if $(B_i, B_j) \notin \mathcal{F}_A$, then it implies that (A, replace($B_i$, $B_j$))
       was part of the changes returned by ReplaceRepair. Since both
       ReplaceNaive and ReplaceSetCover check that the final graph has no
       forbidden-transitivity, this is not possible.

   (b) There is a $B_i$ which is part of a cycle in $\mathcal{G}_A$ for P' and there is a UAT (C, op) $\in
       \mathcal{F}'$ s.t. $B_i \le_D C$. Since $B_i$ is in a cycle in $\mathcal{G}_A$ for P', it should be part of a
       cycle in $\mathcal{G}_A$ for P. If (C, op) $\in \mathcal{F}$ then the inconsistency would have been
       solved. On the other hand, if (C, op) $\notin \mathcal{F}$, then (C, op) is one of the
       changes returned by either InsDelRepair or ReplaceRepair. By a reasoning
       analogous to that in cases 1(a)-1(b), this is not possible either.

Therefore, P' is consistent and is a repair of P. □

B Algorithms 



Algorithm 1 markGraph

Input: DTD graph $G_D$, policy P
Output: Marked DTD graph $MG_D = (G_D, \mu, \chi)$

Let $l_1, l_2, \ldots, l_k$ be the set of nodes in $G_D$ with out-degree = 0
for all l in {$l_1, l_2, \ldots, l_k$} do
    markNode($MG_D$, l, P)
return $MG_D$
Algorithm 2 markNode

Input: Marked DTD graph $MG_D = (G_D, \mu, \chi)$, node B, policy $P = (\mathcal{A}, \mathcal{F})$

for all $A \in V_D$ such that $(A, B) \in E_D$ do
    if $\mu(B)$ = "-" then
        $\mu(A)$ ← "-"
    else
        /* $\mu(B)$ is undefined */
        if (A, insert(B)) $\in \mathcal{F}$ or (A, delete(B)) $\in \mathcal{F}$ or (A, replace(B, B')) $\in \mathcal{F}$ then
            $\mu(B)$ ← $\mu(A)$ ← "-"
        else
            $\mu(B)$ ← "+"
    if $\mu(A)$ = "-" then
        if (A, insert(B)) $\in \mathcal{A}$ and (A, delete(B)) $\in \mathcal{A}$ then
            $\chi(A)$ ← "⊥"
    markNode(A)



Algorithm 3 InsDelRepair

Input: DTD graph $G_D$, security policy P
Output: Set of UATs to remove from P to restore consistency in P w.r.t. insert/delete-
inconsistencies

$MG_D$ ← markGraph($G_D$, P)
changes ← ∅
for all $A \in V_D$ and $(A, B) \in E_D$ do
    if $\chi(A)$ = "⊥" then
        Randomly choose either (A, insert(B)) or (A, delete(B)) and assign it to U
        changes ← changes ∪ {U}
return changes



Algorithm 4 ReplaceRepair

Input: DTD graph $G_D$, security policy $P = (\mathcal{A}, \mathcal{F})$, maximum number of justifications $\mathfrak{J}$
Output: Set of UATs to remove from $\mathcal{A}$ to restore consistency in P w.r.t. replace-inconsistencies

$MG_D$ ← markGraph($G_D$, P)
if $\mathfrak{J}$ = 0 then
    Sol ← ReplaceNaive($r_D$, $MG_D$)
else
    Sol ← ReplaceSetCover($r_D$, $MG_D$, $\mathfrak{J}$)
changes ← ∅
for all $(A, \mathcal{C}) \in$ Sol do
    for all $(B, C) \in \mathcal{C}$ do
        changes ← changes ∪ {(A, replace(B, C))}
return changes
Algorithm 5 ReplaceNaive

Input: Node R, marked graph $MG_D$
Output: Set Sol containing pairs $(B, \mathcal{C})$ where B is a node reachable from R in $MG_D$, and $\mathcal{C}$
a set of edges to delete from $\mathcal{G}_B$ to make it consistent

if $Rg(R) := B_1 + B_2 + \ldots + B_n$ then
    Let $\mathcal{G}_R$ be the replace graph for R
    $\mathcal{C}$ ← ∅
    Let stack S contain all the nodes in $\mathcal{G}_R$
    while S not empty do
        B ← S.pop()
        for all A in $V_R$ s.t. $(A, B) \in \mathcal{E}_R \setminus \mathcal{C}$ do
            for all $C \in V_R$ s.t. $(B, C) \in \mathcal{E}_R \setminus \mathcal{C}$ do
                /* If there is an edge missing for transitivity or if there is a cycle
                   over a node with a UAT forbidden below */
                if $A \ne C$ or $\mu(A)$ = "-" then
                    Let e be one of (A, B), (B, C) (chosen randomly)
                    $\mathcal{C}$ ← $\mathcal{C}$ ∪ {e}
                    if e = (A, B) then
                        G ← A
                    else
                        G ← B
                    for all $F \in V_R$ s.t. F is reachable from G in $\mathcal{G}_R$ do
                        S.push(F)
    Sol ← {$(R, \mathcal{C})$}
else
    Sol ← ∅
for all $(R, B) \in E_D$ do
    Sol ← Sol ∪ ReplaceNaive(B, $MG_D$)
return Sol
Algorithm 6 ReplaceSetCover

Input: Node R, marked DTD graph $MG_D$, forbidden edges $\mathcal{F}_R$, integer $\mathfrak{J}$
Output: Set Sol containing pairs $(B, \mathcal{C})$ where B is a node reachable from R in $MG_D$, and $\mathcal{C}$
a set of edges to delete from $\mathcal{G}_B$ to make it consistent

Sol ← ∅, $\mathcal{C}$ ← ∅, done ← false
if $Rg(R) := B_1 + B_2 + \ldots + B_n$ then
    Let $\mathcal{G}_R = (\mathcal{V}, \mathcal{E})$ be the replace graph for R
    $\mathcal{G}$ ← $\mathcal{G}_R$
    while ¬done do
        $\mathcal{G}^+$ ← ComputeJustifications($\mathcal{G}$, $\mathfrak{J}$)
        /* Algorithm setCoverAlg takes the graph $\mathcal{G}^+$ with the justifications and the set of
           forbidden edges and returns the edges to delete from $\mathcal{G}_R$ */
        $\mathcal{E}_{SC}$ ← setCoverAlg($\mathcal{G}^+$, $\mathcal{F}_R$)
        if $\mathcal{E}_{SC} \ne$ ∅ then
            remove edges in $\mathcal{E}_{SC}$ from $\mathcal{G}$
            $\mathcal{C}$ ← $\mathcal{C}$ ∪ $\mathcal{E}_{SC}$
        else
            done ← true
    Sol ← Sol ∪ {$(R, \mathcal{C})$}
for all $(R, B) \in E_D$ do
    Sol ← Sol ∪ ReplaceSetCover(B, $MG_D$)
return Sol
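Algorithm 7 ComputeJustifications

Input: Replace graph $\mathcal{G}_R$, maximum number of justifications $\mathfrak{J}$
Output: $\mathcal{G}_R^+$, i.e., the transitive closure of $\mathcal{G}_R$ with each edge and node labelled with a set $\mathcal{J}$
containing at most $\mathfrak{J}$ justifications

for all $(A, B) \in \mathcal{E}_R$ do
    $\mathcal{J}((A, B))$ ← {{(A, B)}}
for all $A \in V_R$ do
    $\mathcal{J}(A)$ ← ∅
for all A in $V_R$ do
    for all B in $V_R$ s.t. $(A, B) \in \mathcal{E}_R \cup E$ do
        for all $C \in V_R$ s.t. $(B, C) \in \mathcal{E}_R \cup E$ do
            /* If there is an edge missing for transitivity */
            if $(A, C) \notin \mathcal{E}_R$ and $A \ne C$ then
                if $(A, C) \notin E$ then
                    E ← E ∪ {(A, C)}
                    $\mathcal{J}((A, C))$ ← ∅
                for all $j_1 \in \mathcal{J}((A, B))$ do
                    for all $j_2 \in \mathcal{J}((B, C))$ do
                        if $|\mathcal{J}((A, C))| < \mathfrak{J}$ then
                            $\mathcal{J}((A, C))$ ← $\mathcal{J}((A, C))$ ∪ {$j_1 \cup j_2$}
            /* If there is a cycle */
            if A = C and $\mu(A)$ = "-" then
                for all $j_1 \in \mathcal{J}((A, B))$ do
                    for all $j_2 \in \mathcal{J}((B, A))$ do
                        if $|\mathcal{J}(A)| < \mathfrak{J}$ then
                            $\mathcal{J}(A)$ ← $\mathcal{J}(A)$ ∪ {$j_1 \cup j_2$}
$\mathcal{G}^+$ ← ($V_R$, $\mathcal{E}_R \cup E$)
return $\mathcal{G}^+$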



Algorithm 8 Repair

Input: DTD graph $G_D$, security policy $P = (\mathcal{A}, \mathcal{F})$, boolean total
Output: A repair P' of P. The repair is total if parameter total = 1, partial otherwise.

changes ← InsDelRepair($G_D$, P) ∪ ReplaceRepair($G_D$, P)
$\mathcal{A}'$ ← $\mathcal{A}$ − changes
if total then
    $\mathcal{F}'$ ← valid(D) − $\mathcal{A}'$
else
    $\mathcal{F}'$ ← $\mathcal{F}$
P' ← ($\mathcal{A}'$, $\mathcal{F}'$)
return P'